Add PyCapsule Type Support and Type Hint Enhancements for AggregateUDF in DataFusion Python Bindings #1277
Which issue does this PR close?
- `udaf` function #1237

Rationale for this change
The current `AggregateUDF.udaf` and `AggregateUDF.from_pycapsule` methods in the DataFusion Python API lack proper type hints and handling for CPython `PyCapsule` objects. As a result, static type checkers (e.g., mypy) report errors when users register UDAFs originating from external providers such as `geodatafusion`, even though the code works correctly at runtime.

This PR closes the gap by explicitly supporting PyCapsule types in both type hints and runtime checks. This improves type safety, developer experience, and code clarity while maintaining full backward compatibility.
Example from #1237:
Before
After
What changes are included in this PR?
- New `TypeGuard` function `_is_pycapsule()` for lightweight PyCapsule type validation.
- New `_PyCapsule` proxy class for static typing compatibility in non-type-checking contexts.
- Updated `AggregateUDF.__init__` and `AggregateUDF.udaf()` to accept `AggregateUDFExportable | _PyCapsule` argument types.
- Updated `AggregateUDF.from_pycapsule()` to support direct PyCapsule initialization.
- Refactored `PyAggregateUDF::from_pycapsule()` to delegate PyCapsule validation to a new helper function, `aggregate_udf_from_capsule()`, for cleaner handling.

Are these changes tested?
Yes:
Are there any user-facing changes?
Yes, minor improvements:
These changes are fully backward-compatible; existing user code is unaffected.